Unsupervised False Friend Disambiguation Using Contextual Word Clusters and Parallel Word Alignments
Abstract
Lexical false friends (FF) are words that look the same but differ in meaning or lexical usage. False friends pose several challenges for statistical machine translation (SMT). We present a methodology that exploits word context modeling, together with information provided by word alignments, to identify false friends and choose the right sense for each one in context. We show that our approach improves SMT lexical choice for false friends across language variants. For Egyptian-English SMT, our approach reduces word error rate (WER) and position-independent error rate (PER) by 0.6% and 0.1%, respectively, compared to the baseline.
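The abstract reports its gains as WER and PER reductions. As a rough illustration of what those two metrics measure, here is a minimal sketch of how they are commonly computed on a single tokenized sentence pair. This is not the authors' evaluation code: the function names are hypothetical, and PER follows one common formulation (bag-of-words matches against the reference, ignoring word order).

```python
from collections import Counter
from typing import Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions (Levenshtein)."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (starts as j pure insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete reference word r
                curr[j - 1] + 1,     # insert hypothesis word h
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[-1]


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Assumes a non-empty reference.
    """
    return word_edit_distance(ref, hyp) / len(ref)


def per(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Position-independent error rate: compares bags of words, ignoring order.

    Assumption: errors = max(|ref|, |hyp|) - matched words, where matches are
    counted as a multiset intersection. Assumes a non-empty reference.
    """
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "on the mat the cat sat".split()  # same words, different order
    print(f"WER = {wer(ref, hyp):.3f}")
    print(f"PER = {per(ref, hyp):.3f}")
```

The usage example shows why both metrics are reported: a hypothesis with the right words in the wrong order gets a PER of zero but a positive WER, so PER isolates lexical choice (the focus of false-friend disambiguation) from word-order errors. Corpus-level scores are usually obtained by summing errors over all sentences and dividing by the total number of reference words, rather than by averaging per-sentence rates.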
Similar resources
Translation-oriented Word Sense Induction Based on Parallel Corpora
Word Sense Disambiguation (WSD) is an intermediate task that serves as a means to an end defined by the application in which it is to be used. However, different applications have varying disambiguation needs which should have an impact on the choice of the method and of the sense inventory used. The tendency towards application-oriented WSD becomes more and more evident, mostly because of the ...
Contextual Modeling for Meeting Translation Using Unsupervised Word Sense Disambiguation
In this paper we investigate the challenges of applying statistical machine translation to meeting conversations, with a particular view towards analyzing the importance of modeling contextual factors such as the larger discourse context and topic/domain information on translation performance. We describe the collection of a small corpus of parallel meeting data, the development of a statistica...
Disambiguation of partial cognates
Cognates – words that have similar spelling and meaning in two or more languages – can accelerate vocabulary acquisition and facilitate the reading comprehension task. A student has to pay attention to the pairs of words that look and sound similar but have different meanings – false-friend pairs, and especially to pairs of words that share meanings in some but not all contexts – partial cognat...
Unsupervised Sense Disambiguation Using Bilingual Probabilistic Models
We describe two probabilistic models for unsupervised word-sense disambiguation using parallel corpora. The first model, which we call the Sense model, builds on the work of Diab and Resnik (2002) that uses both parallel text and a sense inventory for the target language, and recasts their approach in a probabilistic framework. The second model, which we call the Concept model, is a hierarchica...
UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness
In this paper we describe an unsupervised WordNet-based Word Sense Disambiguation system, which participated (as UMND1) in the SemEval-2007 Coarse-grained English Lexical Sample task. The system disambiguates a target word by using WordNet-based measures of semantic relatedness to find the sense of the word that is semantically most strongly related to the senses of the words in the context of t...
Publication date: 2015